Amiga Format CD 42

home *** CD-ROM | disk | FTP | other *** search

/ Amiga Format CD 42 / Amiga Format AFCD42 (Issue 126, Aug 1999).iso / -serious- / programming / other / jikes / src / unzip.h < prev next >

Wrap

C/C++ Source or Header | 1999-05-14 | 17KB | 374 lines

// $Id: unzip.h,v 1.2 1999/01/13 21:48:42 shields Exp $ #ifndef unzip_INCLUDED #define unzip_INCLUDED #include "config.h" // // NOTE: Jikes incorporates compression code from the Info-ZIP // group. There are no extra charges or costs due to the use of // this code, and the original compression sources are freely // available from http://www.cdrom/com/pub/infozip/ or // ftp://ftp.cdrom.com/pub/infozip/ on the Internet. // The sole use by Jikes of this compression code is contained in the // files unzip.h and unzip.cpp, which are based on Info-ZIP's inflate.c and // associated header files. // // // You can do whatever you like with this source file, though I would // prefer that if you modify it and redistribute it that you include // comments to that effect with your name and the date. Thank you. // The abbreviated History list below includes the work of the // following: // M. Adler, G. Roelofs, J-l. Failly, J. Bush, C. Ghisler, A. Verheijen, // P. Kienitz, C. Spieler, S. Maxwell, J. Altman // Only the first and last entries from the original inflate.c are // reproduced here. // // // History: // vers date who what // ---- --------- -------------- ------------------------------------ // a ~~ Feb 92 M. Adler used full (large, one-step) lookup table // ... // c16 20 Apr 97 J. Altman added memzero(v[]) in huft_build() // #define DFUNZIP /* needed for unzip compilation*/ #include <stdio.h> #include <stdlib.h> #include <string.h> // // inflate.c -- put in the public domain by Mark Adler // version c15c, 28 March 1997 // // // Inflate deflated (PKZIP's method 8 compressed) data. The compression // method searches for as much of the current string of bytes (up to a // length of 258) in the previous 32K bytes. If it doesn't find any // matches (of at least length 3), it codes the next byte. Otherwise, it // codes the length of the matched string and its distance backwards from // the current position. There is a single Huffman code that codes both // single bytes (called "literals") and match lengths. A second Huffman // code codes the distance information, which follows a length code. Each // length or distance code actually represents a base value and a number // of "extra" (sometimes zero) bits to get to add to the base value. At // the end of each deflated block is a special end-of-block (EOB) literal/ // length code. The decoding process is basically: get a literal/length // code; if EOB then done; if a literal, emit the decoded byte; if a // length then get the distance and emit the referred-to bytes from the // sliding window of previously emitted data. // There are (currently) three kinds of inflate blocks: stored, fixed, and // dynamic. The compressor outputs a chunk of data at a time and decides // which method to use on a chunk-by-chunk basis. A chunk might typically // be 32K to 64K, uncompressed. If the chunk is uncompressible, then the // "stored" method is used. In this case, the bytes are simply stored as // is, eight bits per byte, with none of the above coding. The bytes are // preceded by a count, since there is no longer an EOB code. // If the data are compressible, then either the fixed or dynamic methods // are used. In the dynamic method, the compressed data are preceded by // an encoding of the literal/length and distance Huffman codes that are // to be used to decode this block. The representation is itself Huffman // coded, and so is preceded by a description of that code. These code // descriptions take up a little space, and so for small blocks, there is // a predefined set of codes, called the fixed codes. The fixed method is // used if the block ends up smaller that way (usually for quite small // chunks); otherwise the dynamic method is used. In the latter case, the // codes are customized to the probabilities in the current block and so // can code it much better than the pre-determined fixed codes can. // The Huffman codes themselves are decoded using a multi-level table // lookup, in order to maximize the speed of decoding plus the speed of // building the decoding tables. See the comments below that precede the // lbits and dbits tuning parameters. // GRR: return values(?) // 0 OK // 1 incomplete table // 2 bad input // 3 not enough memory // // // If BMAX needs to be larger than 16, then h and x[] should be unsigned long. // #define BMAX 16 /* maximum bit length of any code (16 for explode) */ #define N_MAX 288 /* maximum number of codes in any set */ // // Notes beyond the 1.93a appnote.txt: // 1. Distance pointers never point before the beginning of the output // stream. // 2. Distance pointers can point back across blocks, up to 32k away. // 3. There is an implied maximum of 7 bits for the bit length table and // 15 bits for the actual data. // 4. If only one code exists, then it is encoded using one bit. (Zero // would be more efficient, but perhaps a little confusing.) If two // codes exist, they are coded using one bit each (0 and 1). // 5. There is no way of sending zero distance codes--a dummy must be // sent if there are none. (History: a pre 2.0 version of PKZIP would // store blocks with no distance codes, but this was discovered to be // too harsh a criterion.) Valid only for 1.93a. 2.04c does allow // zero distance codes, which is sent as one code of zero bits in // length. // 6. There are up to 286 literal/length codes. Code 256 represents the // end-of-block. Note however that the static length tree defines // 288 codes just to fill out the Huffman codes. Codes 286 and 287 // cannot be used though, since there is no length base or extra bits // defined for them. Similarily, there are up to 30 distance codes. // However, static trees define 32 codes (all 5 bits) to fill out the // Huffman codes, but the last two had better not show up in the data. // 7. Unzip can check dynamic Huffman blocks for complete code sets. // The exception is that a single code would not be complete (see #4). // 8. The five bits following the block type is really the number of // literal codes sent minus 257. // 9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits // (1+6+6). Therefore, to output three times the length, you output // three codes (1+1+1), whereas to output four times the same length, // you only need two codes (1+3). Hmm. //10. In the tree reconstruction algorithm, Code = Code + Increment // only if BitLength(i) is not zero. (Pretty obvious.) //11. Correction: 4 Bits: # of Bit Length codes - 4 (4 - 19) //12. Note: length code 284 can represent 227-258, but length code 285 // really is 258. The last length deserves its own, short code // since it gets used a lot in very redundant files. The length // 258 is special since 258 - 3 (the min match length) is 255. //13. The literal/length and distance code bit lengths are read as a // single stream of lengths. It is possible (and advantageous) for // a repeat code (16, 17, or 18) to go across the boundary between // the two sets of lengths. // #define PKZIP_BUG_WORKAROUND /* PKZIP 1.93a problem--live with it */ // // inflate.h must supply the unsigned char slide[WSIZE] array, the zvoid typedef // (void if (void *) is accepted, else char) and the NEXTBYTE, // FLUSH() and memzero macros. If the window size is not 32K, it // should also define WSIZE. If INFMOD is defined, it can include // compiled functions to support the NEXTBYTE and/or FLUSH() macros. // There are defaults for NEXTBYTE and FLUSH() below for use as // examples of what those functions need to do. Normally, you would // also want FLUSH() to compute a crc on the data. // This module uses the external functions malloc() and free() (and // probably memset() or bzero() in the memzero() macro). Their // prototypes are normally found in <string.h> and <stdlib.h>. // /* #define DEBUG */ #ifndef WSIZE /* default is 32K */ # define WSIZE 0x8000 /* window size--must be a power of two, and at least */ #endif